Introducing the LCC Metaphor Datasets
Authors
Abstract
In this work, we present the Language Computer Corporation (LCC) annotated metaphor datasets, which represent the largest and most comprehensive resource for metaphor research to date. These datasets were produced over the course of three years by a staff of nine annotators working in four languages (English, Spanish, Russian, and Farsi). As part of these datasets, we provide (1) metaphoricity ratings for within-sentence word pairs on a four-point scale, (2) scored links to our repository of 114 source concept domains and 32 target concept domains, and (3) ratings for the affective polarity and intensity of each pair. Altogether, we provide 188,741 annotations in English (for 80,100 pairs), 159,915 annotations in Spanish (for 63,188 pairs), 99,740 annotations in Russian (for 44,632 pairs), and 137,186 annotations in Farsi (for 57,239 pairs). In addition, we are providing a large set of likely metaphors which have been independently extracted by our two state-of-the-art metaphor detection systems but which have not been analyzed by our team of annotators.
Related resources
Linear centralization classifier
A classification algorithm, called the Linear Centralization Classifier (LCC), is introduced. The algorithm seeks to find a transformation that best maps instances from the feature space to a space where they concentrate towards the center of their own classes, while maximizing the distance between class centers. We formulate the classifier as a quadratic program with quadratic constraints. W...
Large-scale Dictionary Learning For Local Coordinate Coding
Dictionary learning is a method for learning dictionary items adapted to data of a given distribution. It has been shown that dictionaries learned from data are better suited to vision tasks than universal dictionaries [4]. Traditionally, Vector Quantization (VQ), or using k-means to learn data cluster centroids, is a simple and popular method in the bag-of-features framework [5]. Recently, sparse coding is u...
The effect of using the LCC tool on increasing nitrogen fertilizer use efficiency in paddy fields
Nitrogen use efficiency is relatively low in irrigated rice fields because of rapid N losses from ammonia volatilization, nitrification, surface runoff, and leaching in the soil-floodwater system. Since plant N reflects the total N supply from all sources, plant N status is a good indicator of N availability to crops at any given time. Leaf colour chart (LCC) is a simple portable ...
LCC-Demons: A robust and accurate symmetric diffeomorphic registration algorithm
Non-linear registration is a key instrument for computational anatomy to study the morphology of organs and tissues. However, to be an effective instrument in clinical practice, registration algorithms must be computationally efficient, accurate and, most importantly, robust to the multiple biases affecting medical images. In this work we propose a fast and robust registration frame...
Modelling of Trends in Twitter Using Retweet Graph Dynamics
In this paper we model user behaviour in Twitter to capture the emergence of trending topics. For this purpose, we first extensively analyse tweet datasets of several different events. In particular, for these datasets, we construct and investigate the retweet graphs. We find that the retweet graph for a trending topic has a relatively dense largest connected component (LCC). Next, based on the...